Collecting Statistical Information on Noun-Adjective Multiword Expressions for Extracting the Noun-Noun Ones
Author
Abstract
Extracting and validating multiword expressions (MWEs) with association measures is a very common method. Researchers use it to extract MWEs of various lengths and syntactic structures. However, we never ask whether association measures collected on MWEs with one particular structure (e.g. Verb-Preposition) are relevant for extracting MWEs with a different one (e.g. Adjective-Noun). In this article, we use association measures computed on Noun-Adjective MWEs to train a model, which we then evaluate on the task of extracting and validating Noun-Noun MWEs. We work on the French part of the Europarl corpus and use the Dela dictionary as our gold standard. Finally, we show that, when both are trained on the same text, the model tuned on Noun-Adjective candidates outperforms the model tuned on Noun-Noun candidates at extracting Noun-Noun MWEs.
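The pipeline in the abstract (score candidates with association measures, tune on one syntactic pattern, apply to another) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's method: the abstract does not say which measures or which model are used, so the sketch computes three standard measures (PMI, Dice, t-score) and reduces the "model" to a single F1-optimal cut-off on PMI. All counts are invented, the boolean labels stand in for a Dela lookup, and names such as `Candidate`, `association` and `tune_threshold` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A two-word MWE candidate with (invented) corpus counts."""
    words: tuple[str, str]
    f12: int  # frequency of the word pair
    f1: int   # frequency of the first word
    f2: int   # frequency of the second word


def association(c: Candidate, n: int) -> dict[str, float]:
    """A few standard association measures over a corpus of n pair tokens."""
    expected = c.f1 * c.f2 / n  # pair frequency expected under independence
    return {
        "pmi": math.log2(c.f12 / expected),          # pointwise mutual information
        "dice": 2 * c.f12 / (c.f1 + c.f2),           # Dice coefficient
        "t": (c.f12 - expected) / math.sqrt(c.f12),  # t-score
    }


def f1_score(preds: list[bool], gold: list[bool]) -> float:
    """F1 of boolean predictions against boolean gold labels."""
    tp = sum(p and g for p, g in zip(preds, gold))
    fp = sum(p and not g for p, g in zip(preds, gold))
    fn = sum(g and not p for p, g in zip(preds, gold))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def tune_threshold(scores: list[float], gold: list[bool]) -> float:
    """Return the cut-off that maximizes F1 (the lowest one on ties)."""
    return max(sorted(set(scores)),
               key=lambda t: f1_score([s >= t for s in scores], gold))


N = 1_000_000  # hypothetical number of pair tokens in the corpus

# Hypothetical tuning set: Noun-Adjective candidates with invented gold labels.
noun_adj = [
    (Candidate(("carte", "bleue"), 50, 200, 300), True),
    (Candidate(("affaires", "étrangères"), 40, 150, 100), True),
    (Candidate(("question", "importante"), 5, 5000, 4000), False),
    (Candidate(("point", "essentiel"), 3, 6000, 1000), False),
]
# Hypothetical evaluation set: Noun-Noun candidates, unseen during tuning.
noun_noun = [
    (Candidate(("date", "limite"), 60, 400, 100), True),
    (Candidate(("mot", "clé"), 20, 300, 50), True),
    (Candidate(("question", "rapport"), 4, 5000, 3000), False),
]

measure = "pmi"  # measure used for tuning (an arbitrary choice for this sketch)
threshold = tune_threshold([association(c, N)[measure] for c, _ in noun_adj],
                           [gold for _, gold in noun_adj])
preds = [association(c, N)[measure] >= threshold for c, _ in noun_noun]
nn_f1 = f1_score(preds, [gold for _, gold in noun_noun])
print(f"threshold={threshold:.2f}  Noun-Noun F1={nn_f1:.2f}")
```

Setting `measure` to `"dice"` or `"t"`, or replacing the single threshold with a classifier over all three measures, keeps the same shape: fit on Noun-Adjective candidates, then score Noun-Noun ones.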
Similar resources
Modeling the Statistical Idiosyncrasy of Multiword Expressions
The focus of this work is statistical idiosyncrasy (or collocational weight) as a discriminant property of multiword expressions. We formalize and model this property, compile a 2-class dataset of MWE and non-MWE examples, and evaluate our models on this dataset. We present a possible empirical implementation of collocational weight and study its effects on identification and extraction of MWEs...
A System for Compound Noun Multiword Expression Extraction for Hindi
Compound noun multiword expressions are important for many NLP applications like machine translation and information retrieval. This paper describes a system for Hindi compound noun multiword expressions (MWE) extraction from a given corpus. We identify major categories of compound noun MWEs, based on linguistic and psycholinguistic principles. Our extraction methods use various statistical co-...
Grammatical and phonological influences on word order.
During the grammatical encoding of spoken multiword utterances, various kinds of information must be used to determine the order of words. For example, whereas in adjective-noun utterances like "red car," word order can be determined on the basis of the word's grammatical class information, in noun-noun utterances like "... by car, bus, or ...," word order cannot be determined on the basis of a...
Investigating Embedded Question Reuse in Question Answering
The investigation presented in this paper is a novel method in question answering (QA) that enables a QA system to gain performance through reuse of information in the answer to one question to answer another related question. Our analysis shows that a pair of questions in a general open domain QA can have an embedding relation through their mentions of noun phrase expressions. We present methods f...
Syntax and Semantics vs. Statistics for Italian Multiword Expressions: Empirical Prototypes and Extraction Strategies
In this work we present an empirical analysis performed on Italian nominal multiword expressions (MWEs) of the form [noun + adjective] that aims at studying quantitatively their syntactic and semantic features in order to improve their automatic identification and collection. Three indices are proposed, which are able to measure syntactic and semantic frozenness of the expressions on empirical b...